Markov Models of Typical Systems 

Absorbing State 

- Simple Triplex System with fail-stop behavior 

- Assumptions: 

no repair 

perfect fault coverage 

■ recall: fault coverage is a measure of the system's ability to detect faults and recover

homogeneous components 
independent failures 
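Under these assumptions the triplex reduces to a pure death chain 3 -> 2 -> 1 -> F with rates 3λ, 2λ and λ. The sketch below evaluates its closed-form state probabilities. It assumes (the slide does not say so explicitly) that the system stays up as long as one unit is still running; the function names and the sample rate are illustrative only.

```python
import math

def triplex_state_probs(lam: float, t: float) -> dict:
    """Occupancy of the chain 3 -> 2 -> 1 -> F (rates 3*lam, 2*lam, lam).

    With independent, identical units and no repair, the number of
    surviving units at time t is binomially distributed.
    """
    p = math.exp(-lam * t)      # one given unit is still working at t
    q = 1.0 - p                 # one given unit has failed by t
    return {
        "3": p ** 3,
        "2": 3 * p ** 2 * q,
        "1": 3 * p * q ** 2,
        "F": q ** 3,            # absorbing state: all units failed
    }

def triplex_reliability(lam: float, t: float) -> float:
    """R(t), assuming the system survives while at least one unit runs."""
    q = 1.0 - math.exp(-lam * t)
    return 1.0 - q ** 3

# Example with a hypothetical failure rate of 1e-4 per hour.
probs = triplex_state_probs(1e-4, 1000.0)
print(probs, triplex_reliability(1e-4, 1000.0))
```

The probability of the absorbing state F is the system unreliability, so 1 - P_F(t) equals R(t).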


Different λ's

- e.g. hot + cold spares 

typically MTTF(cold) = 10 × MTTF(hot)

- notation: h.c 
h = number of hots 
c = number of colds 

- assume perfect coverage 

- assume switching mechanism 
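One way to enumerate the h.c states and their transition rates is sketched below. The modelling choices are assumptions made for illustration, not taken from the slide: a failed hot unit is replaced at once by a cold spare if one is left, cold spares may fail while waiting (at the lower cold rate), and state 0.0 is the only absorbing state. The function name `spare_transitions` is hypothetical.

```python
def spare_transitions(h, c, lam_hot, lam_cold):
    """Return {(src, dst): rate} for states named 'h.c' (hots.colds).

    Assumptions (illustrative only):
      - perfect coverage and instantaneous switching;
      - a hot failure consumes one cold spare while spares remain;
      - cold spares fail at rate lam_cold while waiting.
    """
    rates = {}
    hots, colds = h, c
    while hots > 0:
        src = f"{hots}.{colds}"
        if colds > 0:
            # A hot failure (spare switched in) and a cold failure both
            # leave the same number of hots and one cold fewer.
            dst = f"{hots}.{colds - 1}"
            rates[(src, dst)] = hots * lam_hot + colds * lam_cold
            colds -= 1
        else:
            # No spares left: each hot failure reduces the running units.
            dst = f"{hots - 1}.0"
            rates[(src, dst)] = hots * lam_hot
            hots -= 1
    return rates

# MTTF(cold) = 10 * MTTF(hot) means lam_cold = lam_hot / 10.
lam_hot = 1e-3
for (src, dst), rate in spare_transitions(2, 1, lam_hot, lam_hot / 10).items():
    print(src, dst, rate)
```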

Passive TMR with 2 Failure Modes 

- fail passive => processor just disconnects 

- notation: n.f 

n = number of non-faulty processors running 
f = number of faulty processors running 

- assume different fail rates 

failure mode 1: benign (fail-stop), fail rate λ_stop
failure mode 2: single non-benign mode (fail-active), fail rate λ_err
SHARPE & Markov Chains


© 2011 A.W. Krings


Here we used SHARPE to determine 
the unreliabilities. 

The main slide of interest is the last 
one that contains the probabilities of 
being in the specific states. 

Why is this interesting? Well... 

SHARPE extracts model and analysis type: 

- Cyclic vs. Acyclic Model 

- Steady-State vs. Transient Analysis 

markov model_name {param_list}

* transitions: from, to, transition_rate
<name name expression>

end

* initial state probabilities
<name expression>

end
Markov Model

- P(0) = 0 is assumed for any state not listed in initialization 

- The sum of all P(0) must be 1 

Beware of round-off errors 

- Initial state probabilities section may be left empty if: 

Acyclic model with only 1 source state 

■ Assumes P(0) = 1 for that state 

Irreducible model with steady-state analysis

■ in this case initial conditions are irrelevant 

- Advice: Always specify initial probabilities 
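The two rules above (unlisted states start at 0, listed values must sum to 1) can be checked before a model is handed to a solver. The helper below is a hypothetical pre-check written for illustration and is not part of SHARPE. It uses a tolerance because of the round-off issue the slide warns about.

```python
import math

def initial_vector(states, listed, tol=1e-9):
    """Return P(0) for every state; states not listed get probability 0.

    Raises ValueError if the listed probabilities do not sum to 1
    within the absolute tolerance tol.
    """
    total = math.fsum(listed.values())   # fsum limits accumulated round-off
    if not math.isclose(total, 1.0, rel_tol=0.0, abs_tol=tol):
        raise ValueError(f"initial probabilities sum to {total}, expected 1")
    return {s: listed.get(s, 0.0) for s in states}

# Round-off in practice: ten times 0.1 is not exactly 1 with plain summation.
print(sum([0.1] * 10) == 1.0)        # False
print(math.fsum([0.1] * 10) == 1.0)  # True

print(initial_vector(["3.0", "2.0", "2.1"], {"3.0": 1.0}))
```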
Useful functions

tvalue (t; model_name, state; arg_list)

- Gives Transient Probabilities at time t. 

- If no state is given: 

computes transient prob. of being in an absorbing state at time t 

there can be more than one absorbing state 
■ => prob. of being in any absorbing state. 

- If state is given: 

computes transient prob. of being in that state at time t 
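The two call forms can be illustrated on a chain small enough to have a closed form. The sketch below is not SHARPE code: it assumes a hypothetical chain with one transient state `ok` and two absorbing states `fail_safe` and `fail_unsafe`, and it mimics "state given" versus "no state given" with an optional argument.

```python
import math

def tvalue_sketch(t, rate_safe, rate_unsafe, state=None):
    """Transient probability at time t for the hypothetical chain
         ok -> fail_safe   (rate_safe)
         ok -> fail_unsafe (rate_unsafe)
    state=None mimics the 'no state given' form: the probability of
    being in any absorbing state at time t.
    """
    total = rate_safe + rate_unsafe
    p_ok = math.exp(-total * t)        # still in the transient state
    absorbed = 1.0 - p_ok              # absorbed in either failure state
    probs = {
        "ok": p_ok,
        "fail_safe": absorbed * rate_safe / total,
        "fail_unsafe": absorbed * rate_unsafe / total,
    }
    if state is None:
        return probs["fail_safe"] + probs["fail_unsafe"]
    return probs[state]

print(tvalue_sketch(100.0, 9e-5, 1e-5))                 # any absorbing state
print(tvalue_sketch(100.0, 9e-5, 1e-5, "fail_unsafe"))  # one specific state
```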

prob ( model_name, state {;arg_list}) 

- Gives Steady State Probabilities (no time param) 

- Note: state parameter is not optional 

- With absorbing states this computes the steady state probability of 
ever visiting a specific state 

- If no absorbing states exist (irreducible chain), the steady state 
probability of being in a specified state is computed 
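For an acyclic chain, the probability of ever reaching an absorbing state follows from the branching probabilities rate / (total exit rate) along each path. The sketch below applies this to the passive TMR example of this deck (LAMstop = 0.9e-4, LAMerr = 0.1e-4). The function name and the dictionary layout are illustrative choices, not SHARPE's method.

```python
from functools import lru_cache

LAM_STOP, LAM_ERR = 0.9e-4, 0.1e-4
TMR_RATES = {
    ("3.0", "2.0"): 3 * LAM_STOP, ("3.0", "2.1"): 3 * LAM_ERR,
    ("2.0", "1.0"): 2 * LAM_STOP, ("2.0", "1.1"): 2 * LAM_ERR,
    ("2.1", "2.0"): 1 * LAM_STOP, ("2.1", "1.1"): 2 * LAM_STOP,
    ("2.1", "1.2"): 2 * LAM_ERR,
    ("1.0", "0.0"): 1 * LAM_STOP, ("1.0", "0.1"): 1 * LAM_ERR,
}

def absorption_probability(rates, start, target):
    """Probability of ever reaching absorbing state target from start.

    Only valid for acyclic chains: the recursion follows every outgoing
    transition and would not terminate on a cycle.
    """
    out = {}
    for (src, dst), r in rates.items():
        out.setdefault(src, []).append((dst, r))

    @lru_cache(maxsize=None)
    def reach(state):
        if state == target:
            return 1.0
        if state not in out:           # a different absorbing state
            return 0.0
        total = sum(r for _, r in out[state])
        return sum(r / total * reach(dst) for dst, r in out[state])

    return reach(start)

for s in ("0.0", "0.1", "1.1", "1.2"):
    print(s, absorption_probability(TMR_RATES, "3.0", s))
```

For state 0.0 this should come out close to the "probability of entering node" value that SHARPE reports for the TMR example.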
Interpretation of F(t)

If a non-absorbing state is specified: 

- F(t) is the transient or steady state CDF for that state 

If an absorbing state is specified: 

- F(t) is the CDF of the time to absorption in that state

- absorbing state normally indicates a specific failure mode 

If no state is specified: 

- F(t) is the CDF of the time to absorption in any absorbing state

- i.e. it is the sum of all CDFs of individual absorbing states 

- e.g. indicating system failure 
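The "sum of all CDFs" statement can be written out as follows. The symbols are introduced here for illustration and are not SHARPE notation: A is the set of absorbing states, P_a(t) is the unconditional probability of being in absorbing state a at time t, p_a is the probability of ever entering a, and F_a(t) is the conditional CDF of the time to reach a. The last identity assumes a chain that is absorbed with probability 1.

```latex
F(t) \;=\; \sum_{a \in A} P_a(t) \;=\; \sum_{a \in A} p_a \, F_a(t),
\qquad \sum_{a \in A} p_a = 1
```

In the TMR example this is the relation failall = failrun + failstop, evaluated at t = 100.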
TMR example

* SYSTEM: TMR.2MODE - PASSIVE TMR WITH: FAIL-STOP AND FAIL-ACTIVE MODES
* MODELS: MARKOV (ACYCLIC)
* STATE NOTATION: "N.F" WHERE:
*   N == NUMBER OF NON-FAULTY PROCESSORS RUNNING.
*   F == NUMBER OF FAULTY PROCESSORS STILL RUNNING.
*
* -MODEL DEFINITIONS
MARKOV tmr_2mode
*
3.0 2.0 3*LAMstop
3.0 2.1 3*LAMerr
*
2.0 1.0 2*LAMstop
2.0 1.1 2*LAMerr
*
2.1 2.0 1*LAMstop
2.1 1.1 2*LAMstop
2.1 1.2 2*LAMerr
*
1.0 0.0 1*LAMstop
1.0 0.1 1*LAMerr
END
* -INITIAL CONDITIONS (START IN 3.0)
3.0 1.00
END
* -PARAMETER BINDING
BIND
LAMBDA 1*10^-4
LAMstop 0.9*LAMBDA
LAMerr 0.1*LAMBDA
END

* -ANALYSES AND EVALUATIONS
cdf (tmr_2mode)
cdf (tmr_2mode,0.0)

var fail01 value(100.0;tmr_2mode,0.1)
var fail11 value(100.0;tmr_2mode,1.1)
var fail12 value(100.0;tmr_2mode,1.2)
var failrun fail01 + fail11 + fail12
var failstop value(100.0;tmr_2mode,0.0)
var failall failrun + failstop

expr fail01
expr fail11
expr fail12
expr failrun
expr failstop
expr failall
END
CDF for system tmr_2mode:

    1.0000e+00 t( 0) exp( 0.0000e+00 t)
 + -2.5579e+00 t( 0) exp(-1.0000e-04 t)
 +  2.4000e+00 t( 0) exp(-2.0000e-04 t)
 + -2.8421e+00 t( 0) exp(-2.9000e-04 t)
 +  2.0000e+00 t( 0) exp(-3.0000e-04 t)

mean: 1.6713e+04
variance: 1.3541e+08

information about system tmr_2mode node 0.0

probability of entering node: 7.5414e-01

conditional CDF for time of reaching this absorbing state

    1.0000e+00 t( 0) exp( 0.0000e+00 t)
 + -3.0526e+00 t( 0) exp(-1.0000e-04 t)
 +  3.2222e+00 t( 0) exp(-2.0000e-04 t)
 + -1.1696e+00 t( 0) exp(-2.9000e-04 t)

mean: 1.8448e+04
variance: 1.3689e+08

fail01: 7.9815e-08
fail11: 5.3038e-05
fail12: 2.9416e-06
failrun: 5.6059e-05
failstop: 7.1833e-07
failall: 5.6778e-05
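The state probabilities at t = 100 can be cross-checked outside SHARPE. The sketch below uses uniformization (a Poisson-weighted power series of the jump chain), chosen here because it needs only the standard library. It is not the algorithm SHARPE uses, and the helper name `state_probs` is hypothetical. The transition rates are those of the tmr_2mode model with LAMBDA = 1e-4.

```python
import math

LAMBDA = 1e-4
LAM_STOP, LAM_ERR = 0.9 * LAMBDA, 0.1 * LAMBDA
RATES = {
    ("3.0", "2.0"): 3 * LAM_STOP, ("3.0", "2.1"): 3 * LAM_ERR,
    ("2.0", "1.0"): 2 * LAM_STOP, ("2.0", "1.1"): 2 * LAM_ERR,
    ("2.1", "2.0"): 1 * LAM_STOP, ("2.1", "1.1"): 2 * LAM_STOP,
    ("2.1", "1.2"): 2 * LAM_ERR,
    ("1.0", "0.0"): 1 * LAM_STOP, ("1.0", "0.1"): 1 * LAM_ERR,
}

def state_probs(rates, start, t, terms=40):
    """P(state at time t) by uniformization.

    The fixed number of terms is adequate only while q*t is small,
    which holds here (q*t = 0.03).
    """
    states = {s for pair in rates for s in pair}
    exit_rate = dict.fromkeys(states, 0.0)
    for (src, _dst), r in rates.items():
        exit_rate[src] += r
    q = max(exit_rate.values())            # uniformization rate
    vec = dict.fromkeys(states, 0.0)       # distribution after k jumps
    vec[start] = 1.0
    weight = math.exp(-q * t)              # Poisson(k = 0; q*t)
    probs = {s: weight * vec[s] for s in states}
    for k in range(1, terms):
        nxt = {s: vec[s] * (1.0 - exit_rate[s] / q) for s in states}
        for (src, dst), r in rates.items():
            nxt[dst] += vec[src] * r / q
        vec = nxt
        weight *= q * t / k                # Poisson(k; q*t)
        for s in states:
            probs[s] += weight * vec[s]
    return probs

p = state_probs(RATES, "3.0", 100.0)
fail_run = p["0.1"] + p["1.1"] + p["1.2"]  # absorbed with a faulty unit running
fail_stop = p["0.0"]                       # absorbed with all units stopped
fail_all = fail_run + fail_stop
print(f"failrun={fail_run:.4e} failstop={fail_stop:.4e} failall={fail_all:.4e}")
```

The printed values should agree with the failrun, failstop and failall figures in the SHARPE output to the precision shown there.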

© 2011 A.W. Krings, CS449/549 Fault-Tolerant Systems, Sequence 10